Building an automated genomics data curation and collection pipeline

Investigator(s): Dr Jody Phelan (LSHTM), Gary Napier (LSHTM), Dr Ruby Chang (RVC) and Dr Martin Walker (RVC)

Amount Awarded: £30,500

The project aims to build a pipeline for collecting M. tuberculosis NGS data and to apply unsupervised learning methods to characterise the population structure in real time. The project has three stages:

A number of unsupervised learning techniques will be applied to a large database of publicly available isolate sequences to identify the optimum methods to determine population structure. Techniques will be assessed based on speed, scalability and accuracy.
A backend to the TB-Profiler webserver will be developed to integrate frameworks from aims 1 and 2.
A data protection impact assessment will be performed in compliance with GDPR. This will aim to characterise how the service interfaces with user data and to minimise potential data security risks. A security policy will be created to ensure that the project developers are knowledgeable about data privacy and security.
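
As an illustration of how the assessment of unsupervised techniques on speed and accuracy might be set up, the following is a minimal sketch and not the project's actual method. Everything in it is an assumption made for the example: the isolates are simulated binary SNP profiles rather than real M. tuberculosis data, single-linkage clustering on a Hamming distance threshold stands in for whichever techniques the project evaluates, and the function names (simulate_isolates, threshold_clusters, benchmark) and parameter values are hypothetical.

```python
import random
import time
from itertools import combinations


def simulate_isolates(n_per_lineage=20, n_lineages=3, n_sites=200,
                      noise=0.02, seed=0):
    """Simulate binary SNP profiles: each lineage has a random ancestor,
    and each isolate is a copy with a small fraction of sites flipped."""
    rng = random.Random(seed)
    genotypes, labels = [], []
    for lineage in range(n_lineages):
        ancestor = [rng.randint(0, 1) for _ in range(n_sites)]
        for _ in range(n_per_lineage):
            genotypes.append([b ^ (rng.random() < noise) for b in ancestor])
            labels.append(lineage)
    return genotypes, labels


def hamming(a, b):
    """Number of sites at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(genotypes, max_dist):
    """Single-linkage clustering: isolates within max_dist differences
    are linked, and clusters are the connected components (union-find)."""
    parent = list(range(len(genotypes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(genotypes)), 2):
        if hamming(genotypes[i], genotypes[j]) <= max_dist:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(genotypes))]


def rand_index(truth, predicted):
    """Fraction of isolate pairs on which two labelings agree about
    whether the pair belongs to the same group."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (predicted[i] == predicted[j])
                for i, j in pairs)
    return agree / len(pairs)


def benchmark(method, genotypes, truth, **kwargs):
    """Run one clustering method and report wall-clock time and accuracy."""
    start = time.perf_counter()
    predicted = method(genotypes, **kwargs)
    elapsed = time.perf_counter() - start
    return {"seconds": elapsed,
            "n_clusters": len(set(predicted)),
            "rand_index": rand_index(truth, predicted)}


if __name__ == "__main__":
    genotypes, truth = simulate_isolates()
    # max_dist=30 is an arbitrary threshold chosen for the simulated data.
    print(benchmark(threshold_clusters, genotypes, truth, max_dist=30))
```

Scalability could be examined in the same harness by repeating the benchmark with increasing n_per_lineage and comparing the timings; the pairwise distance step here grows quadratically with the number of isolates, which is the kind of cost such a comparison would be expected to expose.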
